Short Text Topic Modeling Techniques, Applications, and Performance: A Survey
Abstract
Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms (e.g., PLSA and LDA) based on word co-occurrences cannot solve this problem very well, since only very limited word co-occurrence information is available in short texts. Therefore, short text topic modeling, which aims at overcoming the problem of sparseness in short texts, has attracted much attention from the machine learning research community in recent years. In this survey, we conduct a comprehensive review of various short text topic modeling techniques proposed in the literature. We present three categories of methods, based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and an analysis of their performance on various tasks. We develop the first comprehensive open-source library, called STTM, for use in Java; it integrates all surveyed algorithms within a unified interface, together with benchmark datasets, to facilitate the expansion of new methods in this research field. Finally, we evaluate these state-of-the-art methods on many real-world datasets and compare their performance against one another and versus a long text topic modeling algorithm.
Similar resources
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
Global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Short and Sparse Text Topic Modeling via Self-Aggregation
The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies to aggregate short texts into pseudo-documents before the application of standard topic modeling. Although such strategies cannot be well generalized to more general genr...
A Survey of Text Mining Techniques and Applications
Text mining has become an important research area. Text mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. In this paper, a survey of text mining techniques and applications is presented.
Topic modeling for OLAP on multidimensional text databases: topic cube and its applications
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...
Survey of Text Mining Techniques, Challenges and their Applications
In everyday life, communication and interaction among people, such as chat, messaging, comments, and board posts, lead to mutual learning and the sharing of valuable knowledge. Social networking websites and search engines also share huge amounts of text data. Text is simply a combination of characters. Therefore, analyzing and extracting information patterns from such data set...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2022
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2020.2992485